This tutorial describes the steps required to create a customised invoice processing solution.
Manually extracting data from large volumes of invoices in varying formats is a daunting, error-prone, and time-consuming task that can lead to processing delays and missed opportunities.
This guide demonstrates how the extraction of both structured and unstructured data from different types of invoices can be automated to streamline inventory management processes. By leveraging the XtractFlow SDK, accurate inventory records can be maintained, informed procurement decisions can be made, and supply chain operations can be optimised.
Before starting, check the Prerequisites page.
A document template serves as a comprehensive definition for a specific type of document. XtractFlow comes with a set of predefined templates for different document types, one of which is an Invoice.
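The Invoice template has the following predefined fields, which cover the most common fields in invoices (names as they appear in the results obtained below): Invoice number, Date emission, Due date, Customer name, Customer address, Vendor name, Vendor address, Total VAT excluded, Total VAT included, VAT percentage, VAT amount, and Currency.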
For this use case, 4 additional fields are also required: Order ID, Payment due date, Unique item count, and Total item count.
To set up the document template for the Invoice, begin by obtaining the predefined Invoice template. Then, add the 4 new fields to it by providing a clear semantic description for each field.
XtractFlow document template configuration in C#

```csharp
static DocumentTemplate buildInvoiceTemplate()
{
    // Template setup: getting an instance of the preconfigured Invoice template.
    DocumentTemplate invoiceTemplate = DocumentTemplates.Invoice;

    // Adding a custom field to the Invoice template instance
    // to get the Order ID in a specific format
    invoiceTemplate.AddField(new()
    {
        Name = "Order ID",
        Format = FieldDataFormat.Text,
        SemanticDescription = "The order ID in the invoice",
        RegexValidationMethods = new List<RegexFieldValidationMethod>
        {
            new RegexFieldValidationMethod("^[A-Z][0-9]{1,6}$")
        }
    });

    // Adding a custom field to the Invoice template instance
    // to calculate the payment due date based on the information in the invoice
    invoiceTemplate.AddField(new()
    {
        Name = "Payment due date",
        Format = FieldDataFormat.Text,
        SemanticDescription = "The date that the payment is due",
        StandardValidationMethods = new List<StandardFieldValidationMethod>()
        {
            new StandardFieldValidationMethod(StandardFieldValidation.DateIntegrity)
        }
    });

    // Adding a custom field to the template instance
    // to get the total number of unique items
    invoiceTemplate.AddField(new()
    {
        Name = "Unique item count",
        Format = FieldDataFormat.Number,
        SemanticDescription = "The total number of unique items",
        StandardValidationMethods = new List<StandardFieldValidationMethod>()
        {
            new StandardFieldValidationMethod(StandardFieldValidation.NumberIntegrity)
        }
    });

    // Adding a custom field to the template instance
    // to get the sum total of all items
    invoiceTemplate.AddField(new()
    {
        Name = "Total item count",
        Format = FieldDataFormat.Number,
        SemanticDescription = "The sum of all items, including multiplying the quantities of each item",
        StandardValidationMethods = new List<StandardFieldValidationMethod>()
        {
            new StandardFieldValidationMethod(StandardFieldValidation.NumberIntegrity)
        }
    });

    return invoiceTemplate;
}
```
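The regular expression attached to the Order ID field can be tried in isolation before it is wired into a template. The sketch below uses only the standard .NET `Regex` class; the `OrderIdPatternDemo` class and the sample values are illustrative, and it assumes that XtractFlow applies the pattern to the whole extracted value, which the `^` and `$` anchors suggest but which is not confirmed here.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OrderIdPatternDemo
{
    // Same pattern as the one passed to RegexFieldValidationMethod above:
    // one uppercase ASCII letter followed by one to six digits.
    private static readonly Regex OrderIdPattern = new Regex("^[A-Z][0-9]{1,6}$");

    public static bool IsValidOrderId(string candidate) =>
        OrderIdPattern.IsMatch(candidate);

    public static void Main()
    {
        // Illustrative samples, not taken from a real invoice set.
        string[] samples = { "X001525", "A1", "x001525", "X0015250", "001525" };
        foreach (string sample in samples)
        {
            Console.WriteLine($"{sample}: {IsValidOrderId(sample)}");
        }
    }
}
```

Only `X001525` and `A1` match: the lowercase prefix, the seven-digit value, and the value without a leading letter are all rejected. If order IDs in a given supplier's invoices follow a different scheme, the pattern needs to be adjusted accordingly.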
Create a ProcessorComponent object, which the processor requires. This object encapsulates the logic of the document processing workflow.
```csharp
static ProcessorComponent buildComponent()
{
    return new ProcessorComponent()
    {
        // Classification is not required as we are using a single template
        EnableClassifier = false,
        // Enabling extraction of fields specified from the templates defined in the "Templates" field below.
        EnableFieldsExtraction = true,
        Templates = new DocumentTemplate[] { buildInvoiceTemplate() }
    };
}
```
Next, instantiate a DocumentProcessor object and call its Process method to start the inference process.
The method returns a ProcessorResult object containing the processing outcome.
```csharp
// Process the document
ProcessorResult result = new DocumentProcessor().Process(sourceFile, component);

// Analyse results
if (result.ExtractedFields != null)
{
    foreach (var item in result.ExtractedFields)
    {
        Console.WriteLine($"Field name: {item.FieldName} | Field value: {item.Value} | Validation state: ({item.ValidationState})");
    }
}
```
Obtained results:
```
Field name: 'Invoice number' | Field value: '1000876' | Validation state: (Undefined)
Field name: 'Date emission' | Field value: '14/08/2023' | Validation state: (Valid)
Field name: 'Due date' | Field value: '13/09/2023' | Validation state: (Valid)
Field name: 'Customer name' | Field value: 'Roger COMPANY' | Validation state: (Undefined)
Field name: 'Customer address' | Field value: '100 Mighty Bay, 125863 Rome, IT' | Validation state: (Valid)
Field name: 'Vendor name' | Field value: 'Rabbit STORE' | Validation state: (Undefined)
Field name: 'Vendor address' | Field value: '255 Commercial Street, 25880 New York, US' | Validation state: (Valid)
Field name: 'Total VAT excluded' | Field value: '1750' | Validation state: (Valid)
Field name: 'Total VAT included' | Field value: '1925' | Validation state: (Valid)
Field name: 'VAT percentage' | Field value: '10' | Validation state: (Undefined)
Field name: 'VAT amount' | Field value: '175' | Validation state: (Valid)
Field name: 'Currency' | Field value: 'USD' | Validation state: (Valid)
Field name: 'Order ID' | Field value: 'X001525' | Validation state: (Valid)
Field name: 'Payment due date' | Field value: '2023-09-13' | Validation state: (Valid)
Field name: 'Unique item count' | Field value: '4' | Validation state: (Valid)
Field name: 'Total item count' | Field value: '17' | Validation state: (Valid)
```
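Before extracted values are fed into an inventory system, it can be worth cross-checking them against each other. The following is a minimal sketch of such a check, not part of the XtractFlow SDK: the `InvoiceSanityCheck` class is hypothetical, the values are copied from the results above into a plain dictionary, and the date formats (`dd/MM/yyyy` for the predefined field, `yyyy-MM-dd` for the custom one) are assumptions based on that single sample.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class InvoiceSanityCheck
{
    private static decimal Number(IReadOnlyDictionary<string, string> fields, string name) =>
        decimal.Parse(fields[name], CultureInfo.InvariantCulture);

    // True when net total + VAT amount equals the gross total.
    public static bool TotalsAgree(IReadOnlyDictionary<string, string> fields) =>
        Number(fields, "Total VAT excluded") + Number(fields, "VAT amount")
            == Number(fields, "Total VAT included");

    // True when the VAT amount equals net total * VAT percentage / 100.
    public static bool VatRateAgrees(IReadOnlyDictionary<string, string> fields) =>
        Number(fields, "Total VAT excluded") * Number(fields, "VAT percentage") / 100m
            == Number(fields, "VAT amount");

    // True when the predefined "Due date" and the custom "Payment due date"
    // denote the same calendar day (formats assumed from the sample output).
    public static bool DueDatesAgree(IReadOnlyDictionary<string, string> fields)
    {
        DateTime predefined = DateTime.ParseExact(
            fields["Due date"], "dd/MM/yyyy", CultureInfo.InvariantCulture);
        DateTime custom = DateTime.ParseExact(
            fields["Payment due date"], "yyyy-MM-dd", CultureInfo.InvariantCulture);
        return predefined == custom;
    }

    public static void Main()
    {
        // Values copied from the results shown above.
        var fields = new Dictionary<string, string>
        {
            ["Due date"] = "13/09/2023",
            ["Payment due date"] = "2023-09-13",
            ["Total VAT excluded"] = "1750",
            ["Total VAT included"] = "1925",
            ["VAT percentage"] = "10",
            ["VAT amount"] = "175",
        };

        Console.WriteLine($"Totals agree: {TotalsAgree(fields)}");
        Console.WriteLine($"VAT rate agrees: {VatRateAgrees(fields)}");
        Console.WriteLine($"Due dates agree: {DueDatesAgree(fields)}");
    }
}
```

For the sample invoice all three checks print `True` (1750 + 175 = 1925, and 10% of 1750 is 175). Real invoices may involve rounding or other date formats, so a production check would probably need a tolerance and more lenient parsing.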
Using XtractFlow to achieve custom data extraction

```csharp
static void RunExtraction()
{
    Configuration.RegisterGdPictureKey("GDPICTURE_KEY");
    Configuration.RegisterLLMProvider(new OpenAIProvider("OPENAI_API_KEY"));
    Configuration.ResourcesFolder = "resources";

    // Building the component
    ProcessorComponent component = buildComponent();

    // Process the document
    ProcessorResult result = new DocumentProcessor().Process("invoice.pdf", component);

    // Analyse results
    if (result.ExtractedFields != null)
    {
        foreach (var item in result.ExtractedFields)
        {
            Console.WriteLine($"Field name: {item.FieldName} | Field value: {item.Value} | Validation state: ({item.ValidationState})");
        }
    }
}

static ProcessorComponent buildComponent()
{
    return new ProcessorComponent()
    {
        // Classification is not required as we are using a single template
        EnableClassifier = false,
        // Enabling extraction of fields specified from the templates defined in the "Templates" field below.
        EnableFieldsExtraction = true,
        Templates = new DocumentTemplate[] { buildInvoiceTemplate() }
    };
}

static DocumentTemplate buildInvoiceTemplate()
{
    // Template setup: getting an instance of the preconfigured Invoice template.
    DocumentTemplate invoiceTemplate = DocumentTemplates.Invoice;

    // Adding a custom field to the Invoice template instance
    // to get the Order ID in a specific format
    invoiceTemplate.AddField(new()
    {
        Name = "Order ID",
        Format = FieldDataFormat.Text,
        SemanticDescription = "The order ID in the invoice",
        RegexValidationMethods = new List<RegexFieldValidationMethod>
        {
            new RegexFieldValidationMethod("^[A-Z][0-9]{1,6}$")
        }
    });

    // Adding a custom field to the Invoice template instance
    // to calculate the payment due date based on the information in the invoice
    invoiceTemplate.AddField(new()
    {
        Name = "Payment due date",
        Format = FieldDataFormat.Text,
        SemanticDescription = "The date that the payment is due",
        StandardValidationMethods = new List<StandardFieldValidationMethod>()
        {
            new StandardFieldValidationMethod(StandardFieldValidation.DateIntegrity)
        }
    });

    // Adding a custom field to the template instance
    // to get the total number of unique items
    invoiceTemplate.AddField(new()
    {
        Name = "Unique item count",
        Format = FieldDataFormat.Number,
        SemanticDescription = "The total number of unique items",
        StandardValidationMethods = new List<StandardFieldValidationMethod>()
        {
            new StandardFieldValidationMethod(StandardFieldValidation.NumberIntegrity)
        }
    });

    // Adding a custom field to the template instance
    // to get the sum total of all items
    invoiceTemplate.AddField(new()
    {
        Name = "Total item count",
        Format = FieldDataFormat.Number,
        SemanticDescription = "The sum of all items, including multiplying the quantities of each item",
        StandardValidationMethods = new List<StandardFieldValidationMethod>()
        {
            new StandardFieldValidationMethod(StandardFieldValidation.NumberIntegrity)
        }
    });

    return invoiceTemplate;
}
```
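The listing above passes the GdPicture key and the OpenAI API key as string literals. One common alternative is to read them from environment variables so that they stay out of source control. The sketch below shows a small helper for that purpose; the `Secrets` class is hypothetical, and using `GDPICTURE_KEY` and `OPENAI_API_KEY` as environment variable names is an assumption made for illustration, not something the SDK prescribes.

```csharp
using System;

public static class Secrets
{
    // Returns the value of a required environment variable,
    // or throws if it is missing or blank.
    public static string Require(string name)
    {
        var value = Environment.GetEnvironmentVariable(name);
        if (string.IsNullOrWhiteSpace(value))
        {
            throw new InvalidOperationException(
                $"Environment variable '{name}' is not set.");
        }
        return value;
    }

    public static void Main()
    {
        // Demo only: set a placeholder value for the current process.
        Environment.SetEnvironmentVariable("GDPICTURE_KEY", "demo-key");
        Console.WriteLine(Require("GDPICTURE_KEY").Length > 0);

        // In RunExtraction, the registration calls from the listing above
        // could then become (variable names are assumptions):
        // Configuration.RegisterGdPictureKey(Secrets.Require("GDPICTURE_KEY"));
        // Configuration.RegisterLLMProvider(new OpenAIProvider(Secrets.Require("OPENAI_API_KEY")));
    }
}
```

Failing fast on a missing variable makes a misconfigured environment visible at startup rather than during the first call to Process.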